Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression

Authors

  • Wajih Halim Boukaram
  • George M. Turkiyyah
  • Hatem Ltaief
  • David E. Keyes
Abstract

We present high performance implementations of the QR and the singular value decomposition of a batch of small matrices hosted on the GPU with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used for its simplicity and inherent parallelism as a building block for the SVD of low rank blocks using randomized methods. We implement multiple kernels based on the level of the GPU memory hierarchy in which the matrices can reside and show substantial speedups against streamed cuSOLVER SVDs. The resulting batched routine is a key component of hierarchical matrix compression, opening up opportunities to perform H-matrix arithmetic efficiently on GPUs.
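As a rough illustration of the building block named in the abstract, the following is a minimal pure-Python sketch of a one-sided (Hestenes) Jacobi SVD applied to a batch of small matrices. It is not the authors' GPU implementation: the function names `jacobi_svd` and `batched_jacobi_svd`, the tolerance, and the sweep limit are assumptions made for this example, and a sequential Python loop stands in for the parallel batch that a GPU kernel would process.

```python
import math

def jacobi_svd(a, tol=1e-12, max_sweeps=60):
    """One-sided Jacobi SVD of an m x n matrix given as a list of rows.

    Returns (u, sigma, v) with a ~= u * diag(sigma) * v^T.
    Singular values are returned unsorted. Hypothetical helper for illustration.
    """
    m, n = len(a), len(a[0])
    # g[j] holds column j of the matrix being orthogonalized; v[j] holds column j of V.
    g = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    v = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in g[p])
                beta = sum(x * x for x in g[q])
                gamma = sum(x * y for x, y in zip(g[p], g[q]))
                # Skip column pairs that are already numerically orthogonal.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Plane rotation that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same rotation to the working columns and to V.
                for cols in (g, v):
                    cp, cq = cols[p], cols[q]
                    cols[p] = [c * x - s * y for x, y in zip(cp, cq)]
                    cols[q] = [s * x + c * y for x, y in zip(cp, cq)]
        if not rotated:
            break
    # Column norms are the singular values; normalized columns form U.
    sigma = [math.sqrt(sum(x * x for x in col)) for col in g]
    u = [[g[j][i] / sigma[j] if sigma[j] > 0.0 else 0.0 for j in range(n)]
         for i in range(m)]
    v_rows = [[v[j][i] for j in range(n)] for i in range(n)]
    return u, sigma, v_rows

def batched_jacobi_svd(batch):
    """Factor each small matrix independently (sequential stand-in for a GPU batch)."""
    return [jacobi_svd(a) for a in batch]
```

Each column pair in a sweep is rotated independently of the data in other matrices, and the matrices in the batch never interact, which is the kind of parallelism the abstract refers to.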

Similar resources

Batched matrix computations on hardware accelerators based on GPUs

Scientific applications require solvers that work on many small size problems that are independent from each other. At the same time, the high-end hardware evolves rapidly and becomes ever more throughput-oriented and thus there is an increasing need for an effective approach to develop energy-efficient, high-performance codes for these small matrix problems that we call batched factorizations....

Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have many applications in real-world problems. When we deal with a huge volume of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering data is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs but we do not have any choice to select the number of clusters and the number of members ...

A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations

As modern hardware keeps evolving, an increasingly effective approach to develop energy efficient and high-performance solvers is to design them to work on many small size and independent problems. Many applications already need this functionality, especially for GPUs, which are currently known to be about four to five times more energy efficient than multicore CPUs. We describe the development...

Image Compression Method Based on QR-Wavelet Transformation

In this paper, a procedure is reported that discusses how linear algebra can be used in image compression. The basic idea is that each image can be represented as a matrix. We apply linear algebra (QR factorization and wavelet transformation algorithms) on this matrix and get a reduced matrix out such that the image corresponding to this reduced matrix requires much less storage space than th...

Fourth-order Tensors with Multidimensional Discrete Transforms

The big data era is swamping areas including data analysis, machine/deep learning, signal processing, statistics, scientific computing, and cloud computing. The multidimensional feature and huge volume of big data put urgent requirements to the development of multilinear modeling tools and efficient algorithms. In this paper, we build a novel multilinear tensor space that supports useful algori...

Journal title:
  • CoRR

Volume abs/1707.05141  Issue 

Pages  -

Publication date 2017